debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Polynomial Regression/[Python] Polynomial Regression.ipynb
Kernel: Python 3

Polynomial Regression

from IPython.display import Image
Image('img/01.png')
Image in a Jupyter notebook
Image('img/02.png')
Image in a Jupyter notebook

Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
%matplotlib inline
plt.rcParams['figure.figsize'] = [14, 8]
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
dataset

Problem Statement: We are a human resources team working for a big company, and we are about to hire a new employee. The candidate seems to be a good fit for the job, and we are about to make an offer. Now it's time to negotiate the future salary of this new employee.

At the beginning of the negotiation, the candidate says that he has twenty-plus years of experience and eventually earned a 160K annual salary at his previous company, so he is asking for more than 160K.

However, someone on the HR team is a bit of a control freak who has always fantasized about being a detective, so he decides to call the previous employer to check the claimed 160K annual salary. Unfortunately, the only information he manages to get is a simple table of salaries for ten different positions in the previous company (the dataset loaded above).

This team member then runs a simple analysis in Excel or Google Sheets and observes that there is a non-linear relationship between the position levels and their associated salaries.

However, this HR person managed to get one more very relevant piece of information: the new employee has been a region manager for two years, and on average it takes four years to move from region manager to partner.

So this employee was roughly halfway between level 6 and level 7, and therefore we can say he was at level 6.5.
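The level estimate is a simple linear interpolation. The sketch below spells out that arithmetic; the variable names are illustrative, and the assumption that progress between levels is linear in time is a simplification rather than something the dataset states.

```python
# Assumption: progress from one level to the next is linear in time served
current_level = 6          # region manager
years_in_role = 2          # time already spent as region manager
years_to_next_level = 4    # average time to move up to partner (level 7)

estimated_level = current_level + years_in_role / years_to_next_level
print(estimated_level)
```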

So now this HR guy is getting all excited, because he is telling the team that he can build a bluffing detector using regression models and predict whether this new employee is bluffing about his salary.

At first the team finds this a little weird, but it is curious to see what will happen. And therefore here is the mission:

The new employee says that his annual salary was 160K. Let's predict whether that is truth or bluff by building a bluffing detector using polynomial regression.


X
array([[ 1], [ 2], [ 3], [ 4], [ 5], [ 6], [ 7], [ 8], [ 9], [10]])
y
array([ 45000, 50000, 60000, 80000, 110000, 150000, 200000, 300000, 500000, 1000000])
plt.scatter(X, y)
plt.title('Salary vs Level')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

We can see the non-linear relationship between Salary and Level.
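A quick numeric check of this, as a minimal sketch using the salary values printed above: for a linear relationship, the salary increase between consecutive levels would be roughly constant. The variable names here are illustrative and not part of the original notebook.

```python
# Salaries for levels 1..10, copied from the `y` array above
salaries = [45000, 50000, 60000, 80000, 110000,
            150000, 200000, 300000, 500000, 1000000]

# Increase in salary from each level to the next
steps = [upper - lower for lower, upper in zip(salaries, salaries[1:])]
print(steps)

# A straight line would have (nearly) equal steps; here the largest step
# is many times the smallest one
step_ratio = max(steps) / min(steps)
print(step_ratio)
```

The steps grow with the level instead of staying flat, which is what the scatter plot shows visually.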

Fitting Linear Regression to the dataset

lin_reg = LinearRegression()
lin_reg.fit(X, y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Fitting Polynomial Regression to the dataset

poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X) # New matrix of features
X_poly
array([[ 1., 1., 1.], [ 1., 2., 4.], [ 1., 3., 9.], [ 1., 4., 16.], [ 1., 5., 25.], [ 1., 6., 36.], [ 1., 7., 49.], [ 1., 8., 64.], [ 1., 9., 81.], [ 1., 10., 100.]])
# Fit a linear regression on the polynomial features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
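To make the transformation less of a black box, here is a minimal NumPy-only sketch of the same idea: build the columns `[1, x, x^2]` by hand and solve the least-squares problem directly. It swaps in `np.vander` and `np.linalg.lstsq` for `PolynomialFeatures` and `LinearRegression`, so the names below are illustrative; the fitted values should agree with the scikit-learn model, although the intercept is stored differently there.

```python
import numpy as np

# Levels and salaries, copied from the X and y arrays above
x = np.arange(1, 11, dtype=float)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

# Columns [1, x, x^2], the same layout as the degree-2 X_poly above
X_poly_manual = np.vander(x, N=3, increasing=True)

# Ordinary least squares on the expanded features:
# salary ~ b0 + b1 * level + b2 * level^2
coeffs, *_ = np.linalg.lstsq(X_poly_manual, y, rcond=None)
fitted = X_poly_manual @ coeffs
print(coeffs)
```

The model is still linear in its coefficients; only the features are non-linear in the level, which is why a plain linear regression can fit a curve.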

Visualising the Linear Regression results

plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg.predict(X), c = 'green')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Visualising the Polynomial Regression results

plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), c = 'green')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Polynomial Regression model with degree 3

poly_reg = PolynomialFeatures(degree = 3)
X_poly = poly_reg.fit_transform(X) # New matrix of features
X_poly
array([[ 1., 1., 1., 1.], [ 1., 2., 4., 8.], [ 1., 3., 9., 27.], [ 1., 4., 16., 64.], [ 1., 5., 25., 125.], [ 1., 6., 36., 216.], [ 1., 7., 49., 343.], [ 1., 8., 64., 512.], [ 1., 9., 81., 729.], [ 1., 10., 100., 1000.]])
# Fit a linear regression on the polynomial features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), c = 'green')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Polynomial Regression model with degree 4

poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X) # New matrix of features
X_poly
array([[ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00], [ 1.00000000e+00, 2.00000000e+00, 4.00000000e+00, 8.00000000e+00, 1.60000000e+01], [ 1.00000000e+00, 3.00000000e+00, 9.00000000e+00, 2.70000000e+01, 8.10000000e+01], [ 1.00000000e+00, 4.00000000e+00, 1.60000000e+01, 6.40000000e+01, 2.56000000e+02], [ 1.00000000e+00, 5.00000000e+00, 2.50000000e+01, 1.25000000e+02, 6.25000000e+02], [ 1.00000000e+00, 6.00000000e+00, 3.60000000e+01, 2.16000000e+02, 1.29600000e+03], [ 1.00000000e+00, 7.00000000e+00, 4.90000000e+01, 3.43000000e+02, 2.40100000e+03], [ 1.00000000e+00, 8.00000000e+00, 6.40000000e+01, 5.12000000e+02, 4.09600000e+03], [ 1.00000000e+00, 9.00000000e+00, 8.10000000e+01, 7.29000000e+02, 6.56100000e+03], [ 1.00000000e+00, 1.00000000e+01, 1.00000000e+02, 1.00000000e+03, 1.00000000e+04]])
# Fit a linear regression on the polynomial features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), c = 'green')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook
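The plots for degrees 2, 3 and 4 can be compared numerically as well. The sketch below uses `np.polyfit` as a stand-in for the `PolynomialFeatures` plus `LinearRegression` pair (both solve the same least-squares problem) and prints the residual sum of squares on the training points for each degree. The names are illustrative and not part of the original notebook.

```python
import numpy as np

# Levels and salaries, copied from the X and y arrays above
x = np.arange(1, 11, dtype=float)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

rss_by_degree = {}
for degree in range(1, 5):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)    # error on the training points
    rss_by_degree[degree] = float(np.sum(residuals ** 2))

for degree, rss in rss_by_degree.items():
    print(f"degree {degree}: RSS = {rss:.3e}")
```

Training error can only go down as the degree increases, so a lower value here is not by itself evidence of a better model. With just ten data points, a high degree can follow the points closely and still behave poorly between them.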

Get a more continuous curve

# Use scalar min/max so np.arange receives plain numbers, not 1-element arrays
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
X_grid[0:10, :] # First 10 elements
array([[ 1. ], [ 1.1], [ 1.2], [ 1.3], [ 1.4], [ 1.5], [ 1.6], [ 1.7], [ 1.8], [ 1.9]])

So X_grid contains 90 elements, from 1.0 to 9.9 with a step of 0.1 (np.arange excludes the stop value).
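Because `np.arange` excludes the stop value, a grid built this way ends just below the highest level. If the curve should reach level 10 as well, `np.linspace` is one alternative, since it includes both endpoints. This is a sketch of that option, not what the notebook itself does; `X_grid_full` is an illustrative name.

```python
import numpy as np

grid_arange = np.arange(1, 10, 0.1)       # stop value is excluded
grid_linspace = np.linspace(1, 10, 91)    # both endpoints are included

print(len(grid_arange), grid_arange[-1])
print(len(grid_linspace), grid_linspace[-1])

# Column vector with one feature, the shape the plotting cell expects
X_grid_full = grid_linspace.reshape(-1, 1)
print(X_grid_full.shape)
```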

plt.scatter(X, y, c = 'red')
plt.plot(X_grid, lin_reg_2.predict(poly_reg.fit_transform(X_grid)), c = 'green')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')
Text(0,0.5,'Salary')
Image in a Jupyter notebook

Predicting a new result with Linear Regression

# predict expects a 2D array of shape (n_samples, n_features)
lin_reg.predict([[6.5]])
array([ 330378.78787879])
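The linear prediction can be cross-checked without scikit-learn. The sketch below applies the closed-form formulas for simple linear regression (slope equals covariance over variance) to the same ten points; the variable names are illustrative.

```python
# Levels and salaries, copied from the X and y arrays above
levels = list(range(1, 11))
salaries = [45000, 50000, 60000, 80000, 110000,
            150000, 200000, 300000, 500000, 1000000]

n = len(levels)
mean_x = sum(levels) / n
mean_y = sum(salaries) / n

# Closed-form ordinary least squares for a single feature
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, salaries))
s_xx = sum((x - mean_x) ** 2 for x in levels)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

prediction = intercept + slope * 6.5
print(round(prediction, 2))
```

The result should match the value returned by `lin_reg.predict`, which is roughly twice the claimed 160K and reflects how poorly a straight line fits this data.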

Predicting a new result with Polynomial Regression

# Reuse the already fitted transformer and pass the level as a 2D array
lin_reg_2.predict(poly_reg.transform([[6.5]]))
array([ 158862.45265153])

Since the predicted salary is about 159K, which is close to the claimed 160K, the new employee does not appear to be bluffing.
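How close "close" is can be made explicit. The sketch below computes the relative gap between each prediction printed above and the claimed salary. The helper `relative_gap` is hypothetical, and what size of gap should count as a bluff is a judgment call that the notebook does not define.

```python
claimed_salary = 160_000

# Predictions copied from the outputs above
poly_prediction = 158862.45265153      # degree-4 polynomial model
linear_prediction = 330378.78787879    # plain linear model


def relative_gap(predicted, claimed):
    """Absolute difference between prediction and claim, as a fraction of the claim."""
    return abs(predicted - claimed) / claimed


poly_gap = relative_gap(poly_prediction, claimed_salary)
linear_gap = relative_gap(linear_prediction, claimed_salary)
print(f"polynomial: {poly_gap:.2%}")
print(f"linear:     {linear_gap:.2%}")
```

The polynomial model lands within 1% of the claim, while the linear model is off by more than 100%, so only the polynomial fit supports the conclusion.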